Transcription factor redundancy and tissue-specific
regulation: Evidence from functional and physical 
network connectivity 

Steven G. Kuntz,¹ Brian A. Williams,¹ Paul W. Sternberg,¹,²,³ and Barbara J. Wold¹,³

¹Division of Biology, ²Howard Hughes Medical Institute, California Institute of Technology, Pasadena, California 91125, USA

Two major transcriptional regulators of Caenorhabditis elegans bodywall muscle (BWM) differentiation, hlh-1 and unc-120, are
expressed in muscle where they are known to bind and regulate several well-studied muscle-specific genes. Simultaneously
mutating both factors profoundly inhibits formation of contractile BWM. These observations were consistent with a simple
network model in which the muscle regulatory factors drive tissue-specific transcription by binding selectively near
muscle-specific targets to activate them. We tested this model by measuring the number, identity, and tissue-specificity of
functional regulatory targets for each factor. Some joint regulatory targets (218) are BWM-specific and enriched for
nearby HLH-1 binding. However, contrary to the simple model, the majority of genes regulated by one or both muscle
factors are also expressed significantly in non-BWM tissues. We also mapped global factor occupancy by HLH-1 and
created a genetic interaction map that identifies hlh-1 collaborating transcription factors. HLH-1 binding did not predict
proximate regulatory action overall, despite enrichment for binding among BWM-specific positive regulatory targets of
hlh-1. We conclude that these tissue-specific factors contribute much more broadly to the transcriptional output of muscle
tissue than previously thought, offering a partial explanation for widespread HLH-1 occupancy. We also identify a novel
regulatory connection between the BWM-specific hlh-1 network and the hlh-8/twist nonstriated muscle network. Finally,
our results suggest a molecular basis for synthetic lethality in which hlh-1 and unc-120 mutant phenotypes are mutually
buffered by joint additive regulation of essential target genes, with additional buffering suggested via newly identified hlh-1
interacting factors.

[Supplemental material is available for this article.] 



Gene networks that govern cell-type-specificity typically center 
around a few core transcription factors that interact directly, both 
physically and genetically, with "terminal differentiation" regula- 
tory target genes (Davidson 2007). For the muscle gene network, 
these core factors are evolutionarily conserved in vertebrates and 
invertebrates, consisting of bHLH factors of the MyoD family and 
members of the MADS family (Fukushige et al. 2006). Decades of 
detailed genetic and molecular studies of selected "model" muscle 
genes showed that core factors interact physically and functionally 
with their transcriptional enhancers and promoters. This led to 
a parsimonious working model in which core factor occupancy 
specified all muscle-restricted transcription and thus defined the 
terminal differentiation state. It is now possible to test this and to 
probe more deeply how the core regulators act individually, addi- 
tively, and/or synergistically on their targets. In principle, it is 
straightforward to build and compare a global physical map of 
factor occupancy determined by ChIP-seq (Johnson et al. 2007)
with a corresponding perturbation map of factor function whose 
global output is measured by mRNA-seq (Mortazavi et al. 2008). 
Many differentiation systems now have good genomic maps of 
one kind but not the other due to various technical and biological 
limitations, but Caenorhabditis elegans bodywall muscle (BWM) is 
especially amenable to both kinds of mapping. In particular, its 
core BWM transcription factors, hlh-1 and unc-120, comprise
a synthetic embryonic lethal pair. This permits each factor to be
eliminated individually and the regulatory impact measured in 
muscle tissue, thus avoiding the problems of other systems in 
which mutation of a single factor eliminates the tissue entirely. 
This comparison can also address questions emerging from related 
systems, including mammalian myogenesis (Cao et al. 2010), in 
which factor occupancy maps are revealing much more pervasive 
physical occupancy across the genome than was initially expected. 

Nematodes have three distinct muscle regulatory networks 
that establish and maintain the differentiated states for their re- 
spective tissues: bodywall muscle, nonstriated muscles (NSM), and 
pharyngeal muscle (PhM). Each core network has a dedicated 
transcription factor (hlh-1 in BWM, hlh-8 in NSM, ceh-22 in PhM)
(Fig. 1A; Chen et al. 1992, 1994; Williams and Waterston 1994; 
Fukushige et al. 2006; Lei et al. 2009). These dedicated factors are
joined by semidedicated factors, such as unc-120 in both NSM and BWM,
that are expressed in multiple muscle types and muscle-associated cells
(muscle-associated GLR cells, coelomocytes, and the contractile somatic
gonad) but not in other tissues (Baugh et al. 2005a; Fukushige et al.
2006), and by more general factors that act in both nonmuscle and
muscle tissues.

BWM is functionally analogous to the skeletal muscle of 
vertebrates and insects (Albertson and Thomson 1976; Chen et al. 
1994; Fukushige et al. 2006), being the most prominent muscle in 
the animal by cell number and mass (81 embryonic and 14 post- 
embryonic BWM cells) (Sulston and Horvitz 1977; Sulston et al. 
1983). Five transcription factors are known to regulate BWM: hlh-1, unc-120, hnd-1, ceh-51, and fozi-1 (Fig. 1A; Harfe et al. 1998a; Mathies et al. 2003; Fukushige et al. 2006; Amin et al. 2007; Broitman-Maduro et al. 2009). Ectopic expression of some of these factors can convert early blastomeres to muscle, as judged by myosin reporter assays; hlh-1 is the most efficient, and all five up-regulate endogenous hlh-1 and unc-120 (Fukushige and Krause 2005; Fukushige et al. 2006; Broitman-Maduro et al. 2009). unc-120 is the most critical hlh-1 collaborator, based on its synthetic lethality with hlh-1 (Baugh et al. 2005b) and its expression throughout BWM (Baugh et al. 2005a; Fukushige et al. 2006). In contrast, the other factors are confined to developmentally early times of specification or very early differentiation and are restricted to subsets of BWM or are not specific to muscle (Baugh et al. 2003; Amin et al. 2007; Yanai et al. 2008; Broitman-Maduro et al. 2009). For these reasons, we consider hlh-1 and unc-120 to be the core regulators for the differentiated BWM network.

Figure 1. Experimental flow and muscle differentiation network. (A) The three types of nematode muscle and their associated transcription factors. (B) The experiments performed across three RNAi conditions (mex-3; skn-1; elt-1 RNAi, mex-3 RNAi, and no RNAi) in N2, hlh-1(cc561), and unc-120(st364). (C) The embryonic development lineages, with BWM cell numbers in normal embryos (red), are highlighted to show which lineages are enhanced under the RNAi conditions (mex-3 RNAi and mex-3; skn-1; elt-1 RNAi).

Detailed knowledge of connectivity between hlh-1 or unc-120 and their downstream targets comes from studies of specific target genes, myo-3, unc-54, and pat-3, where binding of the factor was observed at a specific (mutable and essential) cis-regulatory module (CRM) (Francis and Waterston 1985; Fukushige et al. 2006; Lei et al. 2009). These targets and their CRMs serve as internal standards for genomic assays in this work. Whether specific binding and action of HLH-1 and/or UNC-120 proteins regulate the hundreds of additional BWM genes has been untested, and the extent of individual versus shared connectivity is unknown. Recent studies that mapped HLH-1 protein occupancy across the entire genome (Gerstein et al. 2010; Lei et al. 2010) showed widespread binding in proximity to both muscle-specific and nonmuscle genes, but those studies did not directly address the relationship between binding and genome-wide regulatory dependence on the transcription factors.

Nonstriated muscles comprise a minor fraction of C. elegans (four embryonic and 16 post-embryonic muscles, plus 10 contractile somatic gonad sheath cells) (Sulston and Horvitz 1977; Sulston et al. 1983). NSM uses hlh-8 as its major transcriptional regulator (Harfe et al. 1998b; Corsi et al. 2000; Liu and Fire 2000) along with unc-120 (Fukushige et al. 2006) and, in a subset of the NSM, mls-1 (Kostas and Fire 2002; Reece-Hoyes et al. 2007). Ectopic hlh-8 produces NSM phenotypes in other cell types (Harfe et al. 1998b; Wang et al. 2006; Zhao et al. 2007). hlh-8 and hlh-1 are transiently coexpressed in the post-embryonic M cell, whose progeny ultimately include 14 BWM cells expressing only HLH-1, 16 NSM cells expressing only hlh-8, and two nonmuscle coelomocytes (Sulston and Horvitz 1977). The molecular and regulatory relationship of BWM and NSM networks is a second focus of this work, based on our findings of crosstalk between them.

Here, we construct a C. elegans BWM genomic resource consisting of RNA-seq transcriptomes of the wild-type, hlh-1 mutant, and unc-120 mutant BWM plus ChIP-seq HLH-1 factor occupancy. Cell type classifications (BWM, general, or non-BWM) are assigned to regulatory target genes by comparing transcriptome measurements from BWM-enriched embryos and normal embryos. We then dissect unique and shared regulatory contributions from each factor by comparing transcriptomes of wild-type embryos with those of hlh-1 mutant and unc-120 mutant embryos. This reveals the regulatory influence of each factor on muscle-specific versus broadly expressed genes. We also provide a genetic resource of previously unknown hlh-1 interacting factors identified via a synthetic RNAi screen. Finally, we measure the number, location, and DNA sequence motif composition of in vivo HLH-1 bound regions to evaluate how biochemical factor binding is related to regulatory impact (Fig. 1B).

Prior genetic studies showed that no single factor in the 
core BWM network is essential for muscle differentiation (Baugh 
et al. 2005b; Fukushige et al. 2006; Broitman-Maduro et al. 2009), 
suggesting there is partial "redundancy" between factors, although 
no specific molecular explanation was suggested. Among genes 
affected by hlh-1 mutation in our study, one coherent set of 
transcription factors includes hlh-8/twist, which is known to 
positively regulate NSM differentiation. We discuss how this 
finding, plus other properties of the hlh-1/unc-120 network,
contributes to the tolerance of worm BWM myogenesis to hlh-1 
and unc-120 mutation. 
Results

Increasing muscle by respecification 
reduces nonmuscle background 

Our overall study design for genomic
measurements is shown in Figure 1B. We
increased BWM in embryos by knocking 
down known specification genes for non- 
BWM lineages that act prior to hlh-1 and 
unc-120 (Fig. 1C). Only one-sixth of C. 
elegans normally becomes bodywall 
muscle (Sulston et al. 1983). This presents 
signal-to-noise problems for ChIP and 
transcriptome experiments by diluting 
signal and obscuring any signal's cell type 
source. Increasing the proportion of 
BWM can ameliorate these problems, but 
prior methods (Bowerman et al. 1992; 
Draper et al. 1996; Page et al. 1997; Baugh 
et al. 2005a) had specific disadvantages 
for our purposes (Methods). We increased 
muscle content without directly aug- 
menting the muscle network itself by 
respecifying nonmuscle fates to a muscle 
fate. RNAi knockdown of mex-3 can 
double muscle (Draper et al. 1996), while 
joint knockdown of mex-3, skn-1, and elt-1 
is expected to convert over 80% of cells 
to BWM. mex-3 acts three cell divisions 
before HLH-1 expression (Draper et al. 
1996; Hunter and Kenyon 1996), skn-1 
acts two or three cell divisions beforehand 
(Bowerman et al. 1992; Blackwell et al. 
1994), and elt-1 acts around the time hlh-1 
will be activated (Spieth et al. 1991; Page 
et al. 1997; Michaux et al. 2001) but still 
permits hlh-1 expression. 

We assayed three conditions: no RNAi 
(empty vector); mex-3 RNAi only; and elt-1, 
mex-3, and skn-1 triple RNAi. Since knock- 
ing down multiple genes via RNAi can 
significantly reduce the efficiency of each 
individual knockdown (Gonczy et al. 2000; 
Gouda et al. 2010), we concatenated RNAi 
coding sequences to produce a single 
transcript. As expected, muscle-specific 
transcripts such as tnt-2 and tnt-3 were 
enriched by the RNAi strategy across two 
biological replicates (Fig. 2A). Known 
nonmuscle genes, such as tnc-2, were re- 
duced with RNAi (Fig. 2B). Genes broadly 
expressed in both BWM and non-BWM, 
such as pat-10, were not significantly af- 
fected by RNAi (Fig. 2C). Muscle from the 
triple RNAi sample, unlike mex-3 alone, 
should be dominated by the C and D line- 
ages, at the expense of the MS lineage. The 
mex-3 RNAi condition, which doubles the 
BWM contribution compared with wild 
type, is included to retain MS-derived 
muscle for observation (Figs. 1C, 2F). 
Figure 2. Mutation and RNAi-based muscle enrichment impact gene expression levels. (A) Muscle troponin T tnt-3 exemplifies genes enriched in muscle-enhanced embryos (red). Muscle-normal expression (black) is nonzero since these animals retain significant muscle. (B) Non-BWM troponin C tnc-2 exemplifies genes depleted in muscle-enriched embryos. BWM-enriched animals (red) have reduced non-BWM tissue, so little expression is seen. (C) Troponin pat-10, which is expressed in both BWM and non-BWM, exhibits a negligible net change. (D) dhp-2 (dihydropyrimidinase) and (E) lbp-3 (lipid-binding) are both muscle-enriched; dhp-2 is affected by hlh-1 loss of function, and lbp-3 is affected by unc-120 loss of function, suggesting positive regulation by the BWM transcription factors. (F) Group averages of the ratiometric change for the muscle-enriched genes (those significantly up-regulated in muscle-enriched animals), nonmuscle-enriched genes (those significantly down-regulated in muscle-enriched animals), and annotated BWM genes (which significantly overlap with NSM and PhM genes and are, therefore, more broadly expressed) are plotted against the RNAi feeding conditions. (G) Overlap of hlh-1 and unc-120 regulated genes, both positively and negatively regulated, including the 441 genes positively regulated by both unc-120 and hlh-1. The number of genes in each of the other intersects is listed. (H) BWM transcription factors regulate more broadly expressed RNAs than BWM-specific ones. A total of 1139 genes whose RNA levels are significantly regulated by hlh-1 and/or unc-120 are expressed exclusively in BWM; hlh-1 and unc-120 significantly regulated 1068 and 2694 broadly expressed genes, respectively. (RPKM) Reads per kilobase of gene structure model per million reads.
RNA-seq reveals hlh-1 and unc-120 regulatory targets, many
of which are shared 

We quantified transcriptomes from total polyA+ RNA using RNA-seq (Mortazavi et al. 2008). The ~400-min developmental time point (twofold through threefold stage at 25°C in wild-type animals) was used to ensure that BWM cells had already been specified, thus capturing embryos during differentiation. Because developmental timing could vary across mutants and RNAi conditions, we verified that both hnd-1 and ceh-51 had shut down in all animals (RPKM < 2), indicating that our samples across replicates and conditions represent middle to late differentiation. To identify regulatory targets of hlh-1 and unc-120, we compared wild type and temperature-sensitive hlh-1(cc561) and unc-120(st364) mutants cultured at the nonpermissive temperature, both with and without RNAi feeding. When the hlh-1(cc561) parent generation is shifted to the nonpermissive temperature at the L4 stage, before egg fertilization, no detectable HLH-1 remains in the resulting embryos, and there is no maternal or zygotic effect (Chen et al. 1994), as confirmed here by immunoprecipitation (Methods).
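Expression cutoffs in this section, such as the RPKM < 2 threshold used to confirm that hnd-1 and ceh-51 had shut down, are stated in RPKM: reads per kilobase of gene model per million mapped reads (Mortazavi et al. 2008). The Python sketch below shows that arithmetic and the shutdown check; the function names and the read counts, gene length, and library size in the example are hypothetical, not values from this study.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene model per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    kilobases = gene_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return gene_reads / (kilobases * millions_of_reads)


def has_shut_down(gene_reads: int, gene_length_bp: int, total_mapped_reads: int,
                  threshold: float = 2.0) -> bool:
    """True if the gene is below the RPKM cutoff (2 in the text) in one sample."""
    return rpkm(gene_reads, gene_length_bp, total_mapped_reads) < threshold


# Hypothetical library of 10 million mapped reads and a 2-kb gene model.
print(rpkm(100, 2_000, 10_000_000))          # 5.0
print(has_shut_down(30, 2_000, 10_000_000))  # RPKM 1.5, so True
```

Normalizing by both gene length and library size is what lets the same cutoff be applied across mutants, RNAi conditions, and replicates sequenced to different depths.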

We defined hlh-1 and unc-120 regulatory targets as genes whose expression differs significantly (average expression ± one standard deviation) between wild-type embryos and the hlh-1(cc561) or unc-120(st364) mutant under RNAi-treated conditions (mex-3 and mex-3; elt-1; skn-1 triple RNAi) (Fig. 1B). Among 13,216 genes expressed above background, 1445 are hlh-1 regulatory targets by these relatively conservative criteria, and we expect them to include both direct and indirect targets (Table 1). Of these, 837 decreased significantly (one standard deviation) in the mutant, consistent with a simple positive mechanism of action for HLH-1 (Lei et al. 2009; see Discussion). Conversely, 608 genes were up-regulated by loss of HLH-1. These are explained most simply by indirect negative mechanisms, although the mammalian hlh-1 ortholog, MyoD, can act as a direct negative factor at a few target genes (Berkes et al. 2004; Dilworth et al. 2004; Penn et al. 2004). Mutation of unc-120 significantly reduced RNA levels from 2718 genes in muscle-enhanced embryos (Table 1), while 956 genes had significantly higher transcript levels.
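The target-calling rule can be sketched in Python as below. This is one hypothetical reading of the "average expression ± one standard deviation" criterion, assuming a gene is called when the mean ± 1 SD intervals of wild-type and mutant replicates do not overlap; the exact statistic is in Methods, and the function name and RPKM values are illustrative only.

```python
from statistics import mean, stdev


def call_regulation(wild_type: list[float], mutant: list[float]) -> str:
    """Classify one gene from replicate RPKM values (needs >= 2 per group).

    Hypothetical rule: a target is called when the mean +/- 1 SD intervals
    of wild type and mutant do not overlap.
    "positive": expression drops in the mutant (factor acts positively).
    "negative": expression rises in the mutant (factor acts negatively).
    """
    wt_mean, wt_sd = mean(wild_type), stdev(wild_type)
    mut_mean, mut_sd = mean(mutant), stdev(mutant)
    if mut_mean + mut_sd < wt_mean - wt_sd:
        return "positive"
    if mut_mean - mut_sd > wt_mean + wt_sd:
        return "negative"
    return "unchanged"


# Hypothetical replicate RPKM values for three genes.
print(call_regulation([50.0, 54.0], [20.0, 24.0]))  # positive
print(call_regulation([10.0, 12.0], [30.0, 34.0]))  # negative
print(call_regulation([10.0, 12.0], [11.0, 13.0]))  # unchanged
```

Requiring non-overlapping intervals is stricter than a simple difference-of-means cutoff, which fits the text's description of the criteria as relatively conservative.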

Among 592 transcription factor genes detectably expressed in our samples, 22 were up-regulated and 12 down-regulated in hlh-1 mutant embryos, while 13 were up-regulated and 103 down-regulated in unc-120 mutants (Supplemental Table S1). tbp-1 and nhr-63, regulated by hlh-1, also interact with hlh-1 in an RNAi synthetic lethal screen (see below), making them especially strong candidate members of a more complete BWM transcription network (Discussion). hlh-1 has a proportionally more negative effect than unc-120 on transcription factors (22 of 34 affected transcription factor genes rose in hlh-1 mutants, versus 13 of 116 in unc-120 mutants), which may reflect their differing roles in BWM and NSM. A specific and unexpected example of hlh-1 negative regulation was hlh-8. Because hlh-8 is the major positive transcriptional regulator of NSM (Corsi et al. 2000, 2002; Liu and Fire 2000), this observation suggests a previously unknown regulatory connection between the BWM and NSM networks. In the unc-120 mutants, the transcript level of unc-120 itself was up-regulated, suggesting a negative autoregulatory feedback loop. unc-120 was not, however, significantly affected by hlh-1 mutation. This result differs from a prior report of unc-120 regulation by hlh-1 (Yanai et al. 2008), although that study was performed at earlier developmental time points during specification and used different significance criteria and a different measurement technology (RT-PCR).

Seven hundred and sixty genes were significantly regulated 
by both unc-120 and hlh-1 (P < 0.001 for joint regulation), thus 
offering an explanation for these genes' continued though di- 
minished expression when either factor is mutated. Four hundred 
and forty-one of these genes were positively regulated by both 
factors and 144 were jointly negatively regulated (Table 1; Fig. 2G). 
An additional 175 genes were divergently regulated between the 
two factors. The negative and mixed groups include genes best 
explained by differential regulation between BWM and NSM reg- 
ulatory networks (Discussion). 
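One standard way to ask whether an overlap such as the 760 jointly regulated genes exceeds chance is a one-sided hypergeometric test. The Python sketch below uses only the standard library. Treating the 13,216 expressed genes as the background, with 1445 hlh-1 targets and 3674 unc-120 targets (2718 down-regulated plus 956 up-regulated), is our assumption about how such a test could be set up, not a restatement of the exact procedure behind the reported P-value.

```python
from math import comb


def hypergeom_upper_tail(overlap: int, population: int,
                         set_a: int, set_b: int) -> float:
    """One-sided hypergeometric P(X >= overlap).

    X counts members of set_a among set_b genes drawn without replacement
    from `population` genes.
    """
    max_overlap = min(set_a, set_b)
    if overlap > max_overlap:
        return 0.0
    # Exact integer arithmetic avoids overflow in the binomial coefficients.
    tail = sum(
        comb(set_a, k) * comb(population - set_a, set_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / comb(population, set_b)


# Toy check: drawing 5 of 10 items and getting all 5 "successes" has P = 1/252.
print(hypergeom_upper_tail(5, 10, 5, 5))

# Counts from the text; the choice of background is an assumption.
expected_overlap = 1445 * 3674 / 13216
print(round(expected_overlap))  # 402 genes expected by chance
print(hypergeom_upper_tail(760, 13216, 3674, 1445) < 0.001)  # True
```

Under this assumed background, roughly 402 shared targets would be expected by chance, so 760 is far in the upper tail. A library routine such as SciPy's hypergeometric survival function would give the same quantity.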

hlh-1 and unc-120 regulate many BWM and non-BWM 
muscle genes 

We next asked how hlh-1 and unc-120 regulatory target status is parsed among genes expressed preferentially in BWM, genes expressed in both BWM and other tissues, and genes expressed exclusively elsewhere. The last group serves as a measure of non-cell-autonomous effects and background. Bodywall muscle-enriched genes were defined by comparing RNA-seq of N2 control (no RNAi) embryos with RNAi-treated muscle-enriched animals. Overall, 2058 genes had expression levels significantly higher (one standard deviation) in BWM-enriched worms than in controls, including many classical markers of BWM (Supplemental Table S2). From a previously described set of 38 muscle structural genes (Fox et al. 2008), our set included 20, with the discordant 18 being explained by their known expression in pharynx or other tissues. Among 2901 genes expressed preferentially in nonmuscle tissues (Table 1), 1251 appeared entirely restricted to nonmuscle. The true number of genes with this "nonmuscle" pattern is almost certainly higher because our assay is not sensitive to genes expressed at low levels in only a few cells, and there are many such genes in C. elegans.

Table 1. Impact of hlh-1 mutation on expression levels in BWM-enriched worms

Regulation | Muscle-specific (a) | Widespread (b) | Low in muscle (c) | Absent in muscle (d) | Total
hlh-1 and unc-120 positively regulated | 126 | 195 | 120 | 0 | 441
unc-120 only positively regulated | 667 | 1157 | 358 | 0 | 2182
hlh-1 only positively regulated | 66 | 151 | 99 | 0 | 316
Expressed but unchanged | 919 | 3839 | 814 | 1157 | 6729
hlh-1 only negatively regulated | 93 | 183 | 43 | 50 | 369
unc-120 only negatively regulated | 95 | 470 | 130 | 37 | 732
hlh-1 and unc-120 negatively regulated | 38 | 71 | 28 | 7 | 144
hlh-1 positively regulated and unc-120 negatively regulated | 20 | 25 | 35 | 0 | 80
unc-120 positively regulated and hlh-1 negatively regulated | 34 | 38 | 23 | 0 | 95
Total | 2058 | 6129 | 1650 | 1251 | 11,088
% positively regulated by hlh-1 | 10% | 6.1% | 15% | 0% | 7.5%
% positively regulated by unc-120 | 40% | 23% | 30% | 0% | 25%

(a) Genes whose expression is significantly higher in muscle-enriched animals.
(b) Genes with similar expression levels with and without RNAi.
(c) Genes expressed less, though still present, in muscle-enriched animals.
(d) Genes not expressed in muscle-enriched animals.

As expected, some classic BWM differentiation genes were among the most strongly down-regulated by hlh-1 loss (Fig. 2D) or unc-120 loss (Fig. 2E). RNA levels for these genes decreased significantly (by more than one standard deviation), but none lost all detectable RNA (see Discussion). At the other extreme, a different subgroup of BWM genes, including tni-1 and major actins and myosins, was unaffected by either hlh-1 or unc-120 mutation (Supplemental Fig. S1; Supplemental Table S2). Both unc-120 and hlh-1 contribute widely to both BWM-exclusive and broadly expressed genes (Fig. 2H). Overall, known BWM genes displayed a broad range of quantitative responses to hlh-1 and unc-120 mutation, in both fractional and absolute change. This suggests different regulatory strength contributions from them and, implicitly, from additional transcription factors interacting with subsets of target genes (Supplemental Fig. S1). Of 2058 genes preferentially expressed in muscle, 10% (212) depended significantly on hlh-1 for expression and 40% (827) depended significantly on unc-120 (Table 1). Of all hlh-1 positive regulatory targets, 212 are muscle-preferred, 371 widespread, and 254 depleted but present in BWM. Likewise, unc-120 positive regulatory targets are distributed with 827 being muscle-preferred, 1390 widely expressed, and 501 being depleted but present in BWM.
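The marginal counts quoted above follow directly from Table 1: each factor's positive targets in a tissue class are the sum of the rows in which that factor acts positively, including the divergently regulated rows. The Python sketch below reproduces that bookkeeping; the dictionary layout and row labels are our own shorthand for the table, not part of the original analysis.

```python
# Table 1 counts per row: (muscle-specific, widespread, low in muscle, absent in muscle)
TABLE1 = {
    "both positive": (126, 195, 120, 0),
    "unc-120 only positive": (667, 1157, 358, 0),
    "hlh-1 only positive": (66, 151, 99, 0),
    "unchanged": (919, 3839, 814, 1157),
    "hlh-1 only negative": (93, 183, 43, 50),
    "unc-120 only negative": (95, 470, 130, 37),
    "both negative": (38, 71, 28, 7),
    "hlh-1 positive, unc-120 negative": (20, 25, 35, 0),
    "unc-120 positive, hlh-1 negative": (34, 38, 23, 0),
}

# Rows in which each factor acts positively (including divergent rows).
HLH1_POSITIVE = ["both positive", "hlh-1 only positive",
                 "hlh-1 positive, unc-120 negative"]
UNC120_POSITIVE = ["both positive", "unc-120 only positive",
                   "unc-120 positive, hlh-1 negative"]


def column_sums(rows: list[str]) -> list[int]:
    """Sum the selected rows column by column (one value per tissue class)."""
    return [sum(TABLE1[row][col] for row in rows) for col in range(4)]


totals = column_sums(list(TABLE1))  # genes per tissue class
hlh1 = column_sums(HLH1_POSITIVE)
unc120 = column_sums(UNC120_POSITIVE)

print(totals)  # [2058, 6129, 1650, 1251]
print(hlh1)    # [212, 371, 254, 0]
print(unc120)  # [827, 1390, 501, 0]
print(f"{hlh1[0] / totals[0]:.0%}, {unc120[0] / totals[0]:.0%}")  # 10%, 40%
```

Summing each list also recovers the overall target counts from the text: 837 genes positively regulated by hlh-1 and 2718 by unc-120.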

hlh-1, but not unc-120, negatively regulates some NSM genes

An unexpected result was that NSM-annotated genes were prominent among the group of 307 genes up-regulated specifically in the BWM-enriched hlh-1 mutant animals and not in wild-type animals (Table 2). Prominent in this group were hlh-8, the central regulator in the NSM differentiation network (Corsi et al. 2000), and mls-1, another transcription factor in the NSM network (Kostas and Fire 2002). These findings were unexpected since triple-RNAi treatment in wild-type embryos abolished, or at least reduced, the entire NSM, including enteric muscles and the M cell lineage. This suggests that a subnetwork of BWM genes behaves differently than the rest of the tissue and that genes in this group are candidate regulatory targets of hlh-8 and/or mls-1. Overall, 9.5% (195 genes) of the BWM-preferred expression group were annotated in WormBase as also expressed in normal NSM, and, of these, 26 were up-regulated along with hlh-8 and mls-1. BWM/NSM shared genes showed significant overlap with hlh-1 positively regulated target genes (hypergeometric P < 0.001). Ninety-five percent of NSM genes, including hlh-8 and mls-1, were not detectably elevated in unc-120 mutants, so repression of NSM circuitry is likely specific to hlh-1. Though unaffected in BWM-enhanced wild-type embryos, hlh-8 was positively regulated by unc-120 in muscle-normal animals, presumably by acting in the NSM (which is absent in the BWM-enhanced condition).

Table 2. Genes up-regulated in the hlh-1 mutant muscle-enhanced worms known to be expressed in wild-type NSM cells

Gene | Description
B0336.3 | RNA recognition
ags-3 | G protein signaling
arr-1 | beta-arrestin
C03H5.2 | UDP transporter
ced-1 | Lipoprotein receptor
cts-1 | Citrate synthase
dpy-23 | Adaptin
dsc-1 | Defecation suppressor
egl-20 | WNT, signaling protein
exp-1 | GABA receptor
F47B7.2 | Sulfhydryl oxidase
H28O16.1 | ATP synthase
hlh-8 | TWIST, transcription factor
mls-1 | TBX1, transcription factor
mrp-2 | Multidrug resistance protein
mua-6 | Intermediate filament
mup-4 | Muscle junctions
nlp-13 | Neuropeptide
nmy-1 | Nonmuscle myosin
ppk-3 | PIP kinase
rom-1 | Rhomboid related
shc-1 | Signaling (src, jnk, insulin)
snb-1 | Synaptic vesicle
trs-1 | tRNA synthetase
uvt-3 | Pantothenate kinase
ZK112.3 | Unknown

Synthetic PAT screen for coregulators of hlh-1 and mediators
of hlh-8/hlh-1 crosstalk

As shown above, unc-120 partly explains the robustness of worm myogenesis to hlh-1 mutation, but other factors might perform a similar function for additional hlh-1 targets. To find other regulators that collaborate with hlh-1, we performed a feeding RNAi synthetic paralysis-at-twofold (Pat; WBPhenotype:0000053) phenotype analysis in the hlh-1(cc561) mutant background, using a library of 512 genes that encode known and suspected transcription factors. In nematodes, elongation of the embryo depends on muscle contractions (Williams and Waterston 1994). The Pat phenotype, therefore, serves as a readily scored surrogate for major BWM failure. As expected, unc-120 scored strongly in this assay. Other strong interactors included ceh-20, grh-1, tbp-1, lin-26, pos-1, oma-2, nhr-4, nhr-46, nhr-63, nhr-116, hmg-1.2, hnd-1, and ceh-51 (Supplemental Table S5). The Pat phenotype suggests that each of these contributes to expression of one or more genes needed for differentiation of muscle in the absence of hlh-1.

A majority of these hlh-1 genetic interacting factors are themselves regulated by hlh-1 and/or unc-120. TATA-binding protein (tbp-1) is part of the transcription initiation complex and is positively regulated by both hlh-1 and unc-120, suggesting elevated demand for it by some muscle differentiation genes. nhr-63 is negatively regulated by hlh-1, while nhr-116 is negatively regulated by unc-120. lin-26, pos-1, oma-2, nhr-4, and nhr-46 are positively regulated by unc-120, suggesting feed-forward loops, which are familiar structures in developmental circuits. nhr-63, grh-1, and ceh-20 are normally expressed in NSM, so they are candidates for genes that could interact with both hlh-1 and the hlh-8/mls-1 circuitry.

Genes positively regulated by hlh-1 and unc-120 are enriched
for HLH-1 occupancy, which is widespread 

To map sites of HLH-1 occupancy in vivo, we performed chromatin immunoprecipitation from RNAi-fed wild-type embryos (both mex-3 and triple RNAi) with an anti-HLH-1 antibody, followed by DNA sequencing (ChIP-seq) (Johnson et al. 2007; Zhong et al. 2010). The most prominent signals were consistent in all conditions (Supplemental Fig. S2), but BWM enrichment was important for detecting the majority of HLH-1 ChIP signals. We evaluated ChIP-seq signal intensities and locations relative to background (Pepke et al. 2009) to produce a high-confidence set of 1047 peaks appearing in both RNAi conditions and a more inclusive union set of 9415 peaks appearing in either RNAi condition (mex-3 RNAi yielded 7021 peaks and triple RNAi yielded 3441 peaks; examples in Fig. 3A,B). The peak yield was similar to that of Lei et al. (2010), who recovered 20,143 peaks in their ChIP-seq experiment and narrowed them to 4016 high-confidence peaks. Their use of a different antibody likely accounts for much of the difference in peak identification, though the overlap was statistically significant (P < 0.001). For their ChIP-chip analysis, they used the same antibody but a different enrichment and detection technique (Lei et al. 2010), yielding a number of peaks similar to our high-confidence set and a statistically significant overlap (P < 0.001). The muscle enrichment and regulatory dependence of genes near peaks from both broad and stringent sets were comparable, but the stringent group was more strongly enriched. Overall, >50% and 20% of our hlh-1 positively regulated gene list was captured in the broad and stringent HLH-1-bound sets, respectively.
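The high-confidence and union peak sets are related by inclusion-exclusion: 7021 + 3441 - 1047 = 9415, which is consistent with 1047 peaks being shared between the two RNAi conditions. The Python sketch below builds both sets from two peak lists. It assumes peaks are (chromosome, start, end) half-open intervals and that a peak counts as shared when it overlaps any peak in the other condition, with one-to-one matching; real pipelines may match peaks differently, and the helper names and toy peaks are hypothetical.

```python
from collections import defaultdict

Peak = tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(x: Peak, y: Peak) -> bool:
    """True if two intervals lie on the same chromosome and overlap."""
    return x[0] == y[0] and x[1] < y[2] and y[1] < x[2]


def split_peaks(a: list[Peak], b: list[Peak]) -> tuple[list[Peak], int]:
    """Return the peaks of `a` reproduced in `b`, plus the union size.

    Assumption: each shared peak in `a` matches exactly one peak in `b`,
    so the union is len(a) + len(b) - len(shared).
    """
    b_by_chrom: dict[str, list[Peak]] = defaultdict(list)
    for peak in b:
        b_by_chrom[peak[0]].append(peak)
    shared = [p for p in a if any(overlaps(p, q) for q in b_by_chrom[p[0]])]
    return shared, len(a) + len(b) - len(shared)


# Hypothetical toy peak calls for the two RNAi conditions.
mex3 = [("I", 100, 300), ("I", 1000, 1200), ("II", 50, 150)]
triple = [("I", 250, 400), ("II", 500, 600)]
shared, union_size = split_peaks(mex3, triple)
print(shared)      # [('I', 100, 300)]
print(union_size)  # 4

# The same identity applied to the counts in the text.
print(7021 + 3441 - 1047)  # 9415
```

The per-chromosome lookup keeps the sketch short. For genome-scale peak lists, an interval tree or a sorted sweep would avoid comparing every pair of peaks on a chromosome.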
Figure 3. HLH-1 ChIP-seq binding is associated with, but not predictive of, regulation. HLH-1 binds to the genes (A) dhp-2 and (B) lin-25 (arrows). (C) Venn diagram shows four criteria for rating interactions: BWM-preferred expression (red circle, 2175 genes), hlh-1 regulation of expression (blue circle, 757 genes), HLH-1 ChIP-seq binding (green circle, 9519 genes), and the presence of a local HLH-1 binding motif (yellow circle, 3469 genes). The intersect of 78 genes is highlighted as the "Archetypal Muscle List." (D) Gene expression levels (RPKM from RNA-seq) illustrate different regulatory dependency patterns for hlh-1 and unc-120, with (upper) or without (lower) detectable nearby HLH-1 occupancy. dhp-2 (gold, from C) represents archetypal muscle genes, positively regulated by hlh-1 with significant HLH-1 occupancy. lin-25 (green, from C) is positively regulated by unc-120 but not hlh-1, even though it has HLH-1 occupancy. skr-2 (blue, from C) is positively regulated by both hlh-1 and unc-120. hlh-8 (black) represents a class up-regulated only in hlh-1 mutant BWM, suggesting indirect negative regulation.



Eighty-nine percent of the stringent set of HLH-1-occupied regions were within 5 kb 5'-ward of an annotated gene start (including regions that also fell within an upstream gene), with 36% of those concentrated in the proximal 500 bp. Sixteen percent of regions were in introns, 6.6% in exons, and only 1.2% in 3' UTRs (Supplemental Table S3). The 5-kb 5'-ward, 500-bp proximal, and exon sequences were enriched genome-wide for HLH-1 ChIP peaks (P < 0.01), while other regions were depleted or not enriched (peaks per kb).
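Assigning a peak to these categories requires strand-aware distances: "5'-ward" points toward lower coordinates for a plus-strand gene and toward higher coordinates for a minus-strand gene. The Python sketch below classifies a peak summit relative to one gene using the 500-bp and 5-kb windows from the text. It ignores intron, exon, and UTR structure and makes the categories mutually exclusive, whereas the text counts the proximal 500 bp as part of the 5-kb window; the function name, category labels, and coordinates are illustrative assumptions.

```python
def classify_summit(summit: int, tss: int, gene_end: int, strand: str) -> str:
    """Place a peak summit relative to one gene, using the text's windows.

    tss is the annotated gene start and gene_end the gene's 3' end, both in
    genomic coordinates; strand decides which direction is 5'-ward.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Signed distance along the gene: negative values lie 5'-ward (upstream).
    offset = summit - tss if strand == "+" else tss - summit
    gene_length = abs(gene_end - tss)
    if -500 <= offset < 0:
        return "proximal 500 bp"  # the text also counts these in the 5-kb window
    if -5_000 <= offset < -500:
        return "upstream 5 kb"
    if 0 <= offset <= gene_length:
        return "gene body"
    return "other"


# Hypothetical genes: plus strand (TSS 10,000, end 12,000)
# and minus strand (TSS 20,000, end 18,000).
print(classify_summit(9_800, 10_000, 12_000, "+"))   # proximal 500 bp
print(classify_summit(7_000, 10_000, 12_000, "+"))   # upstream 5 kb
print(classify_summit(20_300, 20_000, 18_000, "-"))  # proximal 500 bp
print(classify_summit(19_000, 20_000, 18_000, "-"))  # gene body
print(classify_summit(30_000, 20_000, 18_000, "-"))  # other
```

Computing a signed offset once keeps the window tests identical for both strands, which avoids the common mistake of treating "upstream" as "lower coordinate" for minus-strand genes.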

Because hlh-1 is a highly cell-type-specific activating transcription factor, an initial expectation was that most BWM-specific genes would have one or more adjacent HLH-1 ChIP regions. For specific CRMs and promoters previously shown to drive BWM expression (Okkema et al. 1993; Krause et al. 1994), this was true, with ChIP signals at expected locations (Supplemental Fig. S2). Genome-wide, 59.7% of the 941 annotated BWM genes had HLH-1 occupancy (broad set) within the gene body or 5 kb upstream (Supplemental Table S4), as did 54% of our BWM-enriched expression gene set. Sixty-seven percent of genes near a stringent HLH-1 peak (i.e., within 5 kb of the start site) are expressed at a significant level (RPKM > 3) in BWM (P < 0.001). However, the vast majority (80% in the stringent set and 87% in the broad set) were not muscle-preferred in their expression pattern (Fig. 3C; Discussion). Rather, most are expressed widely in both muscle and nonmuscle tissue.
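The RPKM > 3 threshold used above can be made concrete with the standard RPKM definition: reads per kilobase of exon model per million mapped reads. This is a sketch only. ERANGE's handling of multireads and splice junctions is omitted, and the helper names and toy values are hypothetical.

```python
from typing import Dict, Set


def rpkm(reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads."""
    return reads * 1e9 / (exon_length_bp * total_mapped_reads)


def fraction_expressed(rpkm_by_gene: Dict[str, float], genes: Set[str],
                       threshold: float = 3.0) -> float:
    """Fraction of `genes` (with a measured RPKM) whose RPKM exceeds `threshold`."""
    scored = [g for g in genes if g in rpkm_by_gene]
    if not scored:
        return 0.0
    return sum(rpkm_by_gene[g] > threshold for g in scored) / len(scored)
```

For example, 300 reads on a 1,500-bp exon model in a library of 20 million mapped reads gives an RPKM of 10, which is well above the threshold of 3.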

Genes whose expression depended positively on hlh-1 (Table 1; dhp-2 in Fig. 3D) were significantly, but not strikingly, enriched for HLH-1 occupancy within 5 kb upstream or in the gene body compared with other genes in the genome (57% vs. 49%, P < 0.001) (Supplemental Table S4), while negatively regulated targets were not enriched (48% vs. 49%; hlh-8 in Fig. 3D). This overlap was on par with that of Lei et al. (2010). Our high-confidence peaks and their ChIP-chip analysis both yielded 5% of occupied genes depending on hlh-1, while their ChIP-seq analysis yielded 9% of occupied genes depending on hlh-1, compared with 10% for our broader data set. Genes depending on unc-120 (Table 1) were also more likely to have HLH-1 occupancy than the rest of the genome (53% vs. 49%, P < 0.001) (lin-25 in Fig. 3D). Genes jointly up-regulated by both hlh-1 and unc-120 function were similarly enriched for HLH-1 occupancy (54% vs. 49%, P < 0.002), though occupancy is not required for joint regulation (skr-2 in Fig. 3D).
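Comparisons such as 57% vs. 49% occupancy are commonly tested with a hypergeometric upper tail, and the Methods list hypergeometric analysis among the tests used. Below is a minimal exact sketch using only the standard library. The mapping of genome counts onto `N`, `K`, `n`, and `k` is our assumption about how such a test would be set up, and the toy numbers are not the study's counts.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Illustrative mapping: N genes in the genome, K of them HLH-1-occupied,
    n genes in a regulated set, k of which are occupied.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)   # exact integer arithmetic until the final division
```

Python integers are unbounded, so genome-scale values of `N` work, although the sums grow large. A library routine such as a hypergeometric survival function would be the practical choice for many tests.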

hlh-8, mls-1, and grl-26 were among 608 genes negatively regulated by hlh-1 without evidence of direct HLH-1 binding in the ChIP data (Fig. 3D). hlh-8, in particular, had no HLH-1 binding in its gene body, within 20 kb upstream of the TSS, or within 10 kb downstream.
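The hlh-8 check above is a strand-aware window query. The following is a minimal sketch assuming inclusive coordinates, with the TSS taken as the gene's left end on "+" and its right end on "-". The function name and default window sizes simply mirror the numbers in the text and are otherwise hypothetical.

```python
from typing import Iterable, List


def peaks_near_gene(peaks: Iterable[int], gene_start: int, gene_end: int, strand: str,
                    upstream: int = 20_000, downstream: int = 10_000) -> List[int]:
    """Return peak summits in the gene body or in a strand-aware window around the TSS."""
    if strand == "+":
        lo, hi = gene_start - upstream, gene_start + downstream
    else:
        # On the minus strand the TSS is gene_end and "upstream" points right.
        lo, hi = gene_end - downstream, gene_end + upstream
    return [p for p in peaks if lo <= p <= hi or gene_start <= p <= gene_end]
```

An empty result corresponds to the "no HLH-1 binding" outcome reported for hlh-8.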
ChIP regions contain the canonical HLH-1 binding motif and novel associated motifs

HLH-1-occupied regions were used to derive overrepresented sequence motifs. We expected to identify motifs responsible for direct HLH-1 binding, together with possible collaborating motifs, since the latter are often present in functional cis-regulatory modules (Davidson 2007). Two substantially different motif discovery algorithms found similar motifs (Methods). The primary motif was AACAGCTG (Fig. 4A, first motif), an E-box family motif (CANNTG is the known general motif for the bHLH family). The core hexamer matches previous HLH-1 motif determinations from yeast one-hybrid assays (Grove et al. 2009), in vivo ChIP-chip and ChIP-seq (Lei et al. 2010), and mammalian MyoD (Cao et al. 2010). The adjacent AA produces a more specific site that is analogous, but not identical, to the most highly preferred myogenin-binding octa-E-box in mouse (CAGCTGRR) (Cao et al. 2010; A Kirilusha and B Wold, in prep.). Six other motifs (Fig. 4A; Supplemental Fig. S3) plus GA- and CT-simple repeat-rich regions (Guhathakurta et al. 2002; GuhaThakurta et al. 2004) were found. Expanding the search radius from 50 bp to 100 bp found a second E-box, CAACTG (web logo not shown), reported previously as a secondary site for HLH-1 binding (Grove et al. 2009; Lei et al. 2009).
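Locating the E-box instances discussed above near a ChIP summit amounts to a consensus scan on both strands within a radius. The sketch below is a simple string and regex scan, not MEME and not a position-weight-matrix search. Translating N to any base, using the motif midpoint as its position, and reporting palindromic motifs once are all our assumptions.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def motif_regex(motif: str) -> str:
    """Translate a consensus containing N wildcards into a regex."""
    return motif.replace("N", "[ACGT]")


def scan_motif(seq: str, motif: str, summit: int, radius: int) -> List[Tuple[int, str]]:
    """Return (start, strand) for hits whose midpoint lies within +/-radius of summit."""
    patterns = [("+", motif_regex(motif))]
    rc = revcomp(motif)
    if rc != motif:                      # palindromic motifs (e.g., CANNTG) reported once
        patterns.append(("-", motif_regex(rc)))
    hits = []
    for strand, pat in patterns:
        # A zero-width lookahead lets overlapping matches be found.
        for m in re.finditer(f"(?={pat})", seq):
            midpoint = m.start() + len(motif) // 2
            if abs(midpoint - summit) <= radius:
                hits.append((m.start(), strand))
    return sorted(hits)
```

In "TTTTAACAGCTGTTTT", the octamer AACAGCTG matches the forward strand at position 4 and the reverse strand (as CAGCTGTT) at position 6. The general E-box CANNTG is palindromic and is reported once.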

Analyzed across all HLH-1 ChIP regions with a 250-bp radius, the two E-boxes and the GAGACGCAGA motif (Fig. 4A, second motif), for which there is no known factor, were strongly centered near ChIP-seq summits. The most statistically significant central concentration was in muscle-specific and unc-120 positively regulated genes (P < 0.05; Fig. 4E; Supplemental Fig. S4). A centered position argues that a motif is partly or solely responsible for the observed ChIP signal. Other motifs were more evenly distributed in the tested regions (Fig. 4B; Supplemental Fig. S3), consistent with accessory or independent roles. As expected, the HLH-1 octa-E-box motif was most highly overrepresented among genes expressed preferentially in muscle (Fig. 4E).
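The centrality argument above can be illustrated with a simple one-sided binomial test against uniform placement of motif midpoints within the ±250-bp window. This is our simplification for illustration. The study analyzed positions following Ozdemir et al. (2011), which is not reproduced here, and the ±25-bp core and toy offsets are assumptions.

```python
from math import comb
from typing import Sequence, Tuple


def centrality_test(offsets: Sequence[int], window: int = 250,
                    core: int = 25) -> Tuple[float, float, float]:
    """Compare the fraction of motif hits within +/-core bp of the summit with a uniform null.

    offsets: motif midpoint minus ChIP summit (bp), for hits within +/-window bp.
    Returns (observed fraction, expected fraction, one-sided binomial P-value).
    """
    n = len(offsets)
    k = sum(1 for d in offsets if abs(d) <= core)
    p0 = (2 * core + 1) / (2 * window + 1)   # uniform over integer offsets
    p_value = sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))
    return k / n, p0, p_value
```

With uniform placement, only about 10% of hits would fall within ±25 bp. A motif with half or more of its hits in that core, as reported for AACAGCTG in Fig. 4B, is strongly centered.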

HLH-1 ChIP-seq peaks near specific functional subsets of genes were analyzed for motif discovery, position, and frequency. Gene groups tested were those (1) strongly positively dependent on hlh-1 for expression, (2) strongly negatively regulated by hlh-1, (3) absent in bodywall muscle, (4) dependent on unc-120 for expression, (5) less stringently dependent on hlh-1 for expression, and (6) dependent on both hlh-1 and unc-120. Motifs identified above were rediscovered within some subsets, in addition to two novel candidates: AAAANNNNNAAA and GCCGATTTGCCG (Fig. 4A, third motif; Supplemental Fig. S3, sixth motif). The GCCGATTTGCCG motif was specifically associated with genes that do not positively depend on hlh-1 and with genes that do depend on unc-120. In fact, this motif was selectively depleted from the set of genes positively regulated by hlh-1 (P < 0.01).
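A depletion claim like the one for GCCGATTTGCCG can be tested on a 2 × 2 table, and the Methods list χ² among the tests used. The following is a minimal Pearson χ² sketch with 1 degree of freedom and no continuity correction. The table layout (regulated set vs. other genes, motif present vs. absent) is our assumption, and the counts in the tests are toy values.

```python
from math import erfc, sqrt
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) for [[a, b], [c, d]].

    Rows: genes in the test set / all other genes.
    Columns: motif present / motif absent near the HLH-1 peak.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


def is_depleted(a: int, b: int, c: int, d: int) -> bool:
    """True if the motif rate in the test set is below the rate in the other genes."""
    return a / (a + b) < c / (c + d)
```

The χ² statistic is symmetric, so the direction of the effect (depletion vs. enrichment) must be read separately, as `is_depleted` does.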

HLH-1-bound regions are preferentially conserved

If HLH-1-occupied E-box motif instances located near hlh-1-regulated genes are functionally significant, we expect them to be preferentially conserved in evolution. Moreover, we expect functional HLH-1 binding sites to be embedded in larger domains of conservation that typify cis-regulatory modules. This was the case around our set of HLH-1-occupied sites, with preferential conservation among sequenced nematodes extending ±200 bp (Fig. 4D). This conservation was not restricted to sites near HLH-1-regulated genes. Rather, the larger set of HLH-1 ChIP regions located near genes that were not regulated by hlh-1 or were not BWM-specific displayed similar preferential conservation (Supplemental Fig. S3B). This suggests that HLH-1 ChIP signals overall identify functionally important sequences, but that these need not be adjacent to muscle-specific or HLH-1-dependent genes. Among the new candidate motifs, three others showed preferential conservation, while GCCGATTTGCCG did not (Supplemental Fig. S3).
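A conservation profile like the one in Fig. 4D is an average of per-base conservation scores across regions aligned on the ChIP summit. The sketch below assumes equal-length, summit-centered score tracks, with the source of the scores left open. It uses a core-versus-flank contrast as a stand-in for the study's comparison with background regions (Ozdemir et al. 2011), which it does not reproduce.

```python
from statistics import mean
from typing import List, Sequence


def mean_profile(tracks: Sequence[Sequence[float]]) -> List[float]:
    """Average per-base conservation scores across regions aligned on the ChIP summit.

    Every track must have the same odd length, with the summit at the middle index.
    """
    return [mean(column) for column in zip(*tracks)]


def core_minus_flank(profile: Sequence[float], core: int) -> float:
    """Mean score within +/-core bp of the center minus the mean score outside it."""
    center = len(profile) // 2
    inside = profile[center - core: center + core + 1]
    outside = list(profile[:center - core]) + list(profile[center + core + 1:])
    return mean(inside) - mean(outside)
```

A positive contrast at a core of about 200 bp would correspond to the preferential conservation described above.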

Discussion 

Through analysis of the BWM differentiation network, we uncovered a significant overlap in transcription factor function, which helps to explain the redundancy of the core factors, and a surprising lack of muscle factor target specificity to BWM. This work expanded loss-of-function analysis for BWM regulation to cover the entire transcriptome, finding that 21% (4359) of C. elegans genes are significantly affected by mutation of either hlh-1 or unc-120. Of all genes, 3.7% (760) were affected by both regulators (Table 1), and these were highly enriched for BWM annotation. Their pattern of regulatory dependence helps to explain how and why hlh-1 and unc-120 act as a synthetic lethal pair in the embryo. However, an equally strong result was that 71% of jointly regulated genes, 79% of HLH-1 single targets, and 69% of UNC-120 single targets are not tissue-specific. Rather, they are substantially regulated by the muscle-specific factors in BWM and, presumably, by other unknown regulators in nonmuscle tissues. By integrating our mapping of regulatory connectivity with in vivo physical HLH-1 occupancy, we were able to define a set of "archetypal" direct transcriptional targets for hlh-1 (Fig. 3C and below); identify biologically pertinent indirect regulatory relationships, including one involving the major NSM-specific regulator, hlh-8 (Fig. 5A,B,C; below); and define a set of HLH-1 occupancy sites located near broadly expressed genes. Since a significant number of broadly expressed HLH-1-occupied loci were functionally affected by hlh-1 mutation (220 genes), we conclude that hlh-1 either originated as a highly muscle-specific factor that has been drafted over time to help regulate widely expressed target genes in the specific context of muscle, or was originally a more general factor whose role was narrowed to muscle tissue early in animal evolution. New hlh-1 collaborating factors, identified in an RNAi screen for muscle failure, expand the BWM differentiation regulatory network and network orthology.

unc-120/hlh-1 compensation is based on overlapping roles in regulatory target control

In nematodes, myogenesis is robust to mutation of either unc-120 or hlh-1 (Baugh and Hunter 2006; Fukushige et al. 2006). By contrast, the hlh-1 bHLH ortholog, myogenin, is absolutely required for mammalian skeletal muscle differentiation, and, in Drosophila, the unc-120 MADS family ortholog, Mef2 (also known as D-MEF2), is absolutely required (Black and Olson 1998). Nevertheless, the common theme is that bodywall or skeletal muscle differentiation in all three phyla uses both bHLH and MADS regulators. Our results help to explain the C. elegans network's unique behavior in three ways.

First, 760 genes are jointly regulated (Table 1), and the regulatory contributions from the two factors are roughly additive rather than highly synergistic. Significant residual expression (>30%) was observed in each mutant strain for the vast majority of shared positively regulated targets (89%), and only one gene (tag-10) lost more than 90% of its expression (Supplemental Fig. S1C). Although all these numbers are sensitive to thresholds, the qualitative results remained unchanged even when the stringency was significantly increased. The troponin gene family, several of whose members have been studied individually, nicely illustrates the varied regulatory specificity
across multiple muscle types as well as in nonmuscle tissue (Supplemental Table S6). unc-120's large pool of positive targets may be partly explained by a broader activating role in both BWM and NSM, rather than in just one muscle type, meaning that genes expressed in both may be regulated primarily by unc-120. hlh-1's smaller pool of regulatory targets and proportionally larger role in repression may reflect its narrow and specific function in BWM alone. This appears to include the capacity to silence the NSM network and potentially other nonmuscle gene networks. We cannot yet say how joint regulation is encoded in the cis-regulatory DNA of most target genes. However, the list of HLH-1-occupied candidate CRMs belonging to jointly regulated worm genes that we produce (below) provides the field with hundreds of specific starting points for direct tests via transgenic assays.

Second, newly identified hlh-1-interacting DNA-binding factors (CEH-20, NHR-63, GRH-1, HMG-1.2, and LIN-39) are strong candidates to explain how important muscle genes, especially those with no unc-120 response, can continue to be expressed without hlh-1. Whether they act on many genes or only on small specific subsets will now be testable by experiments like those done above for hlh-1 and unc-120. Although the DNA-binding motifs for these new hlh-1 interactors are unknown to us, our motif discovery analysis of hlh-1-bound regions produced candidates for combinatorial regulation. By several criteria, the most impressive of these is GAGACGCAGA. This motif is present in 1143 of 9447 HLH-1-bound regions, is most highly enriched in BWM-expressed genes, is preferentially conserved, and is centrally concentrated near HLH-1 ChIP-seq regions. The motif is enriched near genes that are positively regulated by unc-120 and negatively regulated by hlh-1. Together, these facts argue that it binds a significant collaborating factor.

Third, hlh-8 and mls-1 were among the genes strongly up-regulated in hlh-1 mutants, suggesting that a small and specific subgroup of NSM genes contributes to what is otherwise BWM myogenesis when hlh-1 is absent. Among hlh-1-interacting factors identified in the RNAi screen, hmg-1.2 is also normally expressed in NSM and could function as part of this intersecting circuit. Target genes that are divergently regulated by unc-120 and hlh-1 might be explained as additional genes normally necessary in NSM (a domain of unc-120 regulation) but not in BWM.

Figure 4. HLH-1-associated motifs correlate with directional expression control. (A) Web logo position-specific frequency matrix (PSFM) diagrams for three representative motifs and the accompanying number of sites identified near HLH-1 occupancy (250-bp radius). (B) The locations of three motifs relative to the experimentally identified binding sites (analyzed per Ozdemir et al. 2011). The AACAGCTG motif is centered on the called ChIP-seq peak (50% within ±25 bp of the peak for hlh-1 positively regulated or muscle-enriched genes). The GAGACGCAGA motif (second panel) is less central (within 75 bp). The GCCGatttGCCG motif (third panel) shows no significant centrality. The gray line represents a uniform distribution. (C) The occurrence of each motif within ±250 bp of the HLH-1 occupancy peak near genes (ChIP regions within 5 kb of a gene TSS) belonging to each expression group. The E-box shows the greatest enrichment for genes characterized as hlh-1 positively regulated (first panel). The GAGACGCAGA motif is more closely associated with unc-120 positively regulated genes (second panel), whereas the GCCGatttGCCG motif is enriched near genes absent in BWM (third panel). (D) Conservation across sequenced nematodes (C. elegans, C. briggsae, C. remanei, and C. brenneri) of ChIP-seq-identified regions containing the three motifs. Conservation around the in vivo binding site (blue) and around the motif (red) is compared to background (light blue and pink) (Ozdemir et al. 2011), with higher values representing a higher level of conservation. The E-box and GAGACGCAGA motifs, along with their surrounding sequences, are strongly conserved, while the GCCGatttGCCG motif is not conserved at all. (E) Heat maps show the level of motif enrichment (yellow) or depletion (blue) for the CAgCTGtt, GAGACGCAGA, and GCCGatttGCCG motifs near broadly expressed genes that are similarly regulated (y-axis). The E-box is enriched near genes positively regulated by hlh-1 and unc-120. The GAGACGCAGA motif is enriched near genes negatively regulated by hlh-1 and positively regulated by unc-120. The GCCGatttGCCG motif is depleted near genes positively regulated by either factor. (F) Four classes of E-boxes are observed. Class I contains muscle E-boxes that are bound by HLH-1. Mutation of these sites is predicted to change expression, because the nearby genes are both specific to BWM and regulated by hlh-1 (positively or negatively, in contrast to the Archetypal Genes, which are exclusively positively regulated). Class II contains E-boxes that are similarly functional but lie near genes not exclusively expressed in BWM. Class III contains E-boxes that are not required for expression but likely contribute to the expression of nearby genes that are expressed exclusively in BWM. Class IV contains seemingly nonfunctional E-boxes that are neither required for expression nor associated with BWM expression.

Figure 5. hlh-1 negatively regulates the NSM transcription factor hlh-8. (A) Differential splicing of the egl-15 isoforms (a and b) is hlh-1-regulated in BWM. Splicing isoform b, specific to BWM, is unaffected by the mutation, while isoform a, specific to NSM, is up-regulated specifically in the mutant muscle. (B) HLH-1 binding sites local to the genes mab-5, unc-62, and sup-12 are shown, with arrows representing the ChIP-seq peaks. Each gene contains an E-box motif characteristic of HLH-1 occupancy. sup-12 is the only one whose expression depends on hlh-1, but the other genes may be regulated in part by hlh-1. (C) sup-12 depends on HLH-1 binding for expression (blue arrow). The mRNA-binding protein SUP-12 inhibits the splicing variant EGL-15a (Kuroyanagi et al. 2007). In the absence of hlh-1, grh-1 is up-regulated and may be controlled (dashed line) by egl-15a (Zhong and Sternberg 2006) to regulate mab-5 (Venkatesan et al. 2003). In turn, MAB-5 competes with LIN-39 to interact with CEH-20 and UNC-62 in some cells to effect target expression or repression (Liu et al. 2006; Jiang et al. 2009; Potts et al. 2009). They may act on hlh-8, which is known in some cells to depend on ceh-20 and mab-5 (Kenyon 1986; Liu et al. 2006; Jiang et al. 2009). Therefore, in the absence of hlh-1, sup-12 decreases, leading to an increase in egl-15a and grh-1. GRH-1 and EGL-15a work to activate the MAB-5/UNC-62/CEH-20 Hox/Pbx complex to up-regulate the normally repressed hlh-8. This pathway is supported by the appearance of grh-1, ceh-20, and lin-39 (shown in green) in the synthetic PAT screen as being integral for muscle formation.

Crosstalk between NSM and BWM regulators without wholesale tissue conversion

The finding that hlh-8 and mls-1 are strongly up-regulated by hlh-1 mutation raises several questions. At the tissue level, does up-regulation of hlh-8 and mls-1 produce a wholesale transformation of BWM into NSM? It appears not, since many BWM-specific genes were readily detected in hlh-1 mutant RNA. Of 104 genes annotated as expressed in wild-type BWM but not in NSM, all continued to be expressed significantly in hlh-1 mutant embryos, with or without RNAi feeding. At the network level, are muscle genes normally expressed in both NSM and BWM similarly expressed in hlh-1 mutants, as would be expected if they are primarily positively regulated by unc-120, hlh-8, and/or mls-1? Indeed, the vast majority (552 of 596) of genes annotated for both BWM and NSM expression in wild-type worms were similarly expressed in hlh-1 mutants. Only four BWM/NSM shared genes were reduced by more than two standard deviations from their wild-type level in the mutant. However, hlh-8 up-regulation is not sufficient to explain muscle differentiation in the absence of hlh-1, because hlh-8 loss is not synthetic muscle-lethal with hlh-1 loss.
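The "more than two standard deviations" criterion above can be written as a one-line threshold. The text does not state how the standard deviation was estimated. We assume the sample SD across wild-type measurements (for example, the biological replicates described in Methods), which is a noisy estimate with only two replicates. The function name and toy values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def reduced_beyond_k_sd(wild_type: Sequence[float], mutant: float, k: float = 2.0) -> bool:
    """True if the mutant level falls more than k sample SDs below the wild-type mean.

    Requires at least two wild-type measurements; stdev() raises otherwise.
    """
    mu = mean(wild_type)
    sd = stdev(wild_type)
    return mutant < mu - k * sd
```

With wild-type values of 100, 110, and 90 (mean 100, SD 10), a mutant level of 70 is flagged and a level of 85 is not.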

At the level of circuit structure and molecular mechanism, 
why and how is hlh-8 switched on in the absence of hlh-1, and 
what does this imply about their relationship in normal muscle 
development? Negative regulation of hlh-8 by hlh-1 is likely to 
be by an indirect mechanism, partly because hlh-1 is known as 
a positive regulator of its direct targets. In addition, we detected no 
HLH-1 occupancy near hlh-8 or mls-1 via ChIP-seq, even at the most relaxed peak-calling stringency. Our RNA data show that known positive regulators of hlh-8, such as unc-62, ceh-20, and mab-5, are all present in both wild-type and hlh-1 mutant animals. This makes activation of hlh-8 highly plausible when negative regulation mediated indirectly by hlh-1 is relieved (Harfe et al. 1998a).

Drawing on our data and additional studies, we propose a specific model for regulation of hlh-8 by hlh-1 (Fig. 5). From the synthetic PAT phenotype analysis, ceh-20, lin-39, and grh-1 were identified as strong genetic hlh-1 interactors. By independent criteria, each of these is also a candidate to help activate hlh-8. grh-1 positively regulates mab-5 (Venkatesan et al. 2003), and mab-5/unc-62/ceh-20 positively regulates hlh-8 in NSM (Liu and Fire 2000). Furthermore, we found that egl-15a and grh-1 are up-regulated in our hlh-1 mutants (Fig. 5A), and there is additional genetic precedent in Drosophila for interaction between mab-5 and egl-15 (Zhong and Sternberg 2006). EGL-15/FGFR is necessary for proper sex myoblast (NSM) migration (Stern and Horvitz 1991). The primary splicing variant in NSM, EGL-15a, is down-regulated by SUP-12, which inhibits EGL-15a but not EGL-15b, the primary splicing variant in BWM (Kuroyanagi et al. 2007). We found that the EGL-15a RNA splice isoform is up-regulated in hlh-1 mutants, that sup-12 expression depends on HLH-1 activity, and that sup-12 has a high-confidence HLH-1 occupancy domain (Fig. 5B). We therefore suggest that in normal BWM, HLH-1 drives SUP-12 to down-regulate EGL-15a and grh-1. In hlh-1 mutant muscle, by contrast, SUP-12 is not expressed, and EGL-15a and grh-1 increase. This activates mab-5/unc-62/ceh-20 and leads to up-regulation of hlh-8 and some of its NSM target genes (Fig. 5C). This should especially favor target genes normally expressed in both NSM and BWM, since collaborating factors from BWM are present.
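The proposed circuit can be written as an ON/OFF caricature to make its logic explicit: removing hlh-1 should switch hlh-8 on, provided unc-62 and ceh-20 are present. This Boolean sketch is our illustration, not a quantitative model from the study. The rule that either GRH-1 or EGL-15a input suffices to activate the Hox/Pbx complex is an assumption.

```python
from typing import Dict


def hlh8_model(hlh1: bool, unc62: bool = True, ceh20: bool = True) -> Dict[str, bool]:
    """ON/OFF caricature of hlh-1 -> sup-12 -| egl-15a/grh-1 -> MAB-5/UNC-62/CEH-20 -> hlh-8."""
    sup12 = hlh1                    # HLH-1 drives sup-12 expression
    egl15a = not sup12              # SUP-12 inhibits the EGL-15a isoform
    grh1 = not sup12                # the model places grh-1 under SUP-12-dependent repression
    mab5 = grh1                     # grh-1 positively regulates mab-5
    # Assumption: either GRH-1 or EGL-15a input suffices to activate the complex.
    hox_pbx = mab5 and unc62 and ceh20 and (grh1 or egl15a)
    hlh8 = hox_pbx                  # the MAB-5/UNC-62/CEH-20 complex activates hlh-8
    return {"sup-12": sup12, "egl-15a": egl15a, "grh-1": grh1,
            "mab-5": mab5, "hlh-8": hlh8}
```

In this caricature, hlh-8 is OFF in wild type and ON in the hlh-1 mutant. Additionally removing ceh-20 turns hlh-8 OFF again.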

There is also evidence for reciprocal repression of BWM by 
NSM, since hlh-8 mutants have an unstable and sometimes higher 
number of BWM cells and their sex-specific muscles disappear 
(Corsi et al. 2000, 2002). Up-regulation of hlh-8 in BWM in the 
absence of hlh-1 is reminiscent of the connection reported in the 
post-embryonic M lineage (Harfe et al. 1998a), and regulation of 
normal M lineage development, which generates both NSM and 
BWM, might account for the crosstalk we see in BWM upon hlh-1 
mutation. 

Defining "archetypal" hlh-1 target genes and their candidate CRMs

We distilled a set of 78 genes and associated candidate cis-acting regulatory modules that meet four criteria for being "archetypal" regulatory and molecular hlh-1 targets: (1) they are expressed preferentially in BWM; (2) they display significantly reduced RNA levels in hlh-1 mutants; (3) they have HLH-1 occupancy at one or more sites in our ChIP-seq data; and (4) the HLH-1 occupancy region contains one or more instances of the extended myogenic "octa-E-box" (AACAGCTG) (Supplemental Table S7; Fig. 3C). An additional 154 genes satisfy criteria 2, 3, and 4 but are also strongly expressed in tissues other than BWM. These genes apparently depend on HLH-1 in the context of muscle and on other factors elsewhere.
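The four-criterion definition above is a set intersection. It also yields the companion group that meets criteria 2 through 4 without BWM-preferred expression. The sketch below assumes each criterion has already been reduced to a set of gene identifiers. The placeholder names g1 through g5 are hypothetical and are not genes from the study.

```python
from typing import Dict, Set


def archetypal_sets(bwm_preferred: Set[str], reduced_in_hlh1: Set[str],
                    hlh1_occupied: Set[str], octa_ebox_in_peak: Set[str]) -> Dict[str, Set[str]]:
    """Intersect the four criteria; also return genes meeting 2-4 but not criterion 1."""
    criteria_2_to_4 = reduced_in_hlh1 & hlh1_occupied & octa_ebox_in_peak
    return {
        "archetypal": criteria_2_to_4 & bwm_preferred,
        "not_bwm_preferred": criteria_2_to_4 - bwm_preferred,
    }
```

Applied to the study's gene sets, the first group corresponds to the 78-gene Archetypal Muscle List and the second to the additional 154 genes.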

Membership in our list of candidate hlh-1 BWM CRMs did not use DNA sequence conservation as an initial criterion, since recently evolved active instances may exist and would be pertinent to the network. This allowed us to ask whether the candidate canonical muscle CRM group is preferentially conserved, and it was. Preferential conservation encompassed a region of ±150 bp relative to the HLH-1 ChIP peak (Fig. 4D), indicating that this group of candidate CRMs has been under selective pressure to function. The muscle octa-E-box is even more highly conserved than the surrounding domain, suggesting that it drives binding of functional consequence.



The archetypal muscle HLH-1 targets were defined without using unc-120 data, yet they are highly enriched in unc-120 regulatory targets. Thus, 56% of the hlh-1 archetypal loci are positively regulated by unc-120, while only 18% of all genes are regulated by unc-120. Similarly, 52% of genes satisfying criteria 2, 3, and 4 were positive regulatory targets of unc-120. Finally, of the genes in the archetypal group that are also unc-120-regulated, 20% contain an 85% match to GAGACGCAGA within their HLH-1-occupied region.
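The "85% match" criterion above is interpreted here as at least 85% base identity to the GAGACGCAGA consensus in some window on either strand. For a 10-mer, this means at least 9 of 10 positions must match. That interpretation is our assumption, since the study may instead have used a matrix-based score.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def best_identity(seq: str, consensus: str) -> float:
    """Highest fraction of positions matching `consensus` over all windows on both strands."""
    k = len(consensus)
    best = 0.0
    rc = seq.translate(_COMPLEMENT)[::-1]
    for strand_seq in (seq, rc):
        for i in range(len(strand_seq) - k + 1):
            window = strand_seq[i:i + k]
            matches = sum(a == b for a, b in zip(window, consensus))
            best = max(best, matches / k)
    return best


def has_instance(seq: str, consensus: str = "GAGACGCAGA", min_identity: float = 0.85) -> bool:
    """True if some window on either strand reaches `min_identity` to the consensus."""
    return best_identity(seq, consensus) >= min_identity
```

A single mismatch (90% identity) passes the threshold, while two mismatches (80%) do not.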

The archetypal HLH-1 target genes and candidate CRMs were defined by an intentionally stringent intersection of multiple measurements to help learn the defining and shared characteristics of BWM regulation. This set is, therefore, an underestimate of the BWM group, and it highlights the important role hlh-1 plays in regulating genes not specific to BWM.

HLH-1 occupancy versus hlh-1 regulatory impact 

Expression of hlh-1 is specific to the bodywall muscle system, the phenotype of hlh-1 null mutants is myogenic, and hlh-1 orthologs across metazoan phyla regulate muscle development and differentiation. This relative simplicity made it possible to address some questions about factor occupancy that have been difficult in systems with larger genomes and more complex organization. First, factor occupancy alone, as measured by HLH-1 ChIP-seq, is a permissive condition for regulating gene expression, but it is not powerfully predictive of regulatory activity. In isolation, HLH-1 occupancy had low specificity for hlh-1-dependent RNA expression at nearby genes or TSSs. HLH-1 occupancy in an independent study (Lei et al. 2010) had almost identical specificity (9% in Lei et al. [2010] versus 10% in this analysis). The Lei et al. (2010) study also concluded that many sites were upstream of nonmuscle genes. Their conclusion that binding was not predictive of enhancer activity mirrors our conclusion that it is not predictive of regulatory activity (Lei et al. 2010). Similarly, the majority of mouse MyoD occupancy sites are located closest to nonmuscle-specific genes (Cao et al. 2010). Substantial technical issues surrounding assay sensitivity, combined with the consequences of assigning binding regions to genes by an overly simple proximity algorithm, are probably responsible for some of the lack of predictive power. Nevertheless, the data are most consistent with a majority of detected HLH-1 binding events having no regulatory effect or an effect too small to measure. Where hlh-1 regulation is detected, most of it is associated with genes that are not specific to BWM alone.
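To make the "overly simple proximity algorithm" concrete, the sketch below assigns a peak to whichever gene has the nearest TSS within a fixed distance. It ignores strand, gene bodies, and distal enhancers, which is exactly the kind of simplification that can mislabel a regulatory region. The 5-kb cutoff echoes the Results, but the function itself is a hypothetical illustration, not the authors' assignment procedure.

```python
from typing import Optional, Sequence, Tuple


def nearest_gene(summit: int, tss_list: Sequence[Tuple[str, int]],
                 max_distance: int = 5_000) -> Optional[str]:
    """Assign a peak to the gene whose TSS is closest, if within max_distance bp."""
    best_gene, best_dist = None, max_distance + 1
    for gene, tss in tss_list:
        dist = abs(summit - tss)
        if dist < best_dist:
            best_gene, best_dist = gene, dist
    return best_gene
```

For example, a peak that regulates a gene 8 kb away would be credited to an unrelated neighbor with a closer TSS, or left unassigned.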

From HLH-1 ChIP-seq regions, we refined the HLH-1 binding motif preferentially affiliated with HLH-1-regulated loci (AACAGCTG) and showed that it is the dominant, centrally located driver of HLH-1 genome occupancy. The distribution of additional novel motifs from HLH-1-occupied regions among the functionally distinct target gene groups provided insights into BWM/NSM cis-regulatory logic. The GAGACGCAGA motif frequently co-occurs with the octa-E-box and is frequently present in HLH-1 ChIP regions. It is a plausible candidate to bind UNC-120 or an intimate UNC-120 collaborator. The motif is preferentially concentrated near genes positively regulated by unc-120, significantly enriched in genes annotated for NSM and BWM (Fig. 4E; Supplemental Fig. S4), and present in more than half of archetypal BWM candidate CRMs. Several other discovered motifs also colocalize with HLH-1 binding, but more weakly, and these are reasonable candidates to collaborate with HLH-1 in other target gene subgroups (Supplemental Fig. S3). For example, CGnnGCGAGACCC is enriched near genes positively regulated by hlh-1 but negatively regulated by unc-120. This pattern is expected for BWM genes that must be turned off in NSM, a group with which it has significant overlap. In contrast, GCCGatttGCCG is selectively depleted near genes positively regulated by either hlh-1 or unc-120, suggesting that this motif mediates a function orthogonal to muscle differentiation (Fig. 4E). None of the newly discovered motifs closely resembles known transcription factor binding sites, although some bear resemblance to previously reported muscle-associated motifs (Guhathakurta et al. 2002; GuhaThakurta et al. 2004). All are candidates to bind hlh-1-interacting factors from the RNAi screen.

Methods 

Additional details may be found in the Supplemental Material.

General methods and strains

We obtained C. elegans strains N2, PD4605 (hlh-1(cc561)), and RW364 (unc-120(st364)) from the Caenorhabditis Genetics Center (CGC) and cultured them using standard methods (Brenner 1974). To increase the proportion of muscle, we knocked down the early specification genes mex-3, elt-1, and skn-1 to permit muscle specification (Fig. 1C; Supplemental Material).

RNAi feeding 

Bacteria (Ahringer Lab RNAi Library) were used for control (HT115) and mex-3 RNAi feeding, in addition to RNAi knockdown for the synthetic PAT screen. The elt-1, mex-3, and skn-1 inserts were fused in a single vector for triple RNAi feeding (Gouda et al. 2010). Knocking down multiple transcripts simultaneously can suffer from poor efficiency (Gonczy et al. 2000), but our concatenation technique maintained high penetrance. Effectiveness was measured in aliquots from each biological replicate, with 100% of animals affected by RNAi (100% embryonic-lethal, with none reaching the comma stage morphologically). Costaining with phalloidin and DAPI revealed significant enrichment of myosin in N2 worms, though not every cell was converted to muscle. Synchronized worms were grown on seeded NGM special plates with IPTG and carboxy-penicillin at 15°C until the L4 stage, and then at 25°C until gravid adults began egg-laying. Gravid adults were bleached, and the eggs were shaken at 25°C in S-complete medium (five embryos/µL) for 400 min to ensure muscle differentiation but avoid tissue necrosis (Fig. 1B).

ChIP-seq

Immunoprecipitation with an existing anti-HLH-1 polyclonal antibody (Lei et al. 2009) was performed in N2 and hlh-1(cc561) animals with a modified protocol (Weinmann and Farnham 2002). Although cc561 is not a null allele, the mutation effectively destroys HLH-1 function (Lei et al. 2009), probably through nonsense-mediated decay (Harfe et al. 1998a), and no ChIP signal above background was seen in the mutant. Embryos were freeze-cracked in 2% formaldehyde on dry ice five times, fixed for 30 min, and then quenched for 5 min. The embryos were washed, lysed, and sonicated (Misonix, output 3.5) with a microtip for fifteen 30-sec pulses at 1-min intervals. Ten percent of the sample was set aside as control. The antibody was added to the chromatin prep and allowed to mix for 16 h at 4°C. Four sequential aliquots of 200 µL of magnetic beads (Invitrogen Dynabeads M-280 Sheep anti-Rabbit IgG) were then added for 4 h to extract the antibody. Beads were washed, the complexes eluted, and the DNA organically purified and quantified (Invitrogen Qubit Fluorometer). For ChIP sequencing, the average number of reads was 17 ± 2 million, the average number of unique reads was 12 ± 1 million, and quality control failed on an average of 0.5 ± 0.1 million reads (Supplemental Table S8).

RNA-seq 

For RNA-based sequencing, embryos were flash-frozen in TRIzol (Sigma) and freeze-cracked on dry ice five times. The embryos were passed through a 21 G needle and a 25 G needle (10× each) to shear the eggshell. The RNA was precipitated, treated with Turbo DNase (Ambion), and oligo(dT)-purified (Invitrogen Dynabeads Oligo-dT). External quantification standards were spiked into the mRNA, which was then fragmented to an average length of 200 nt by heating at 94°C in the presence of Mg2+ for 90 sec. The fragmented mRNA was then random-primed with hexamers for first-strand reverse transcription, followed by nick-translation second-strand synthesis using a double-stranded cDNA synthesis kit (Invitrogen). Comparisons were made between two independently sequenced biological replicates for each condition, except for untreated hlh-1 mutants, for which only a single pooled sample was sequenced. For RNA sequencing, the average number of reads was 26 ± 9 million, the average number of unique reads was 20 ± 7 million, and the average number of repeat reads was 7 ± 4 million (Supplemental Table S8). Quality control failed on an average of 0.5 ± 0.3 million reads per sequenced lane.

Library making and sequencing 

The standard single-amplification Illumina library-making protocol was used, including end repair, adaptor ligation, gel purification, and PCR amplification. Flowcell generation and sequencer running followed the Illumina protocol. All genomic data are DNA-sequence-based and publicly available through GEO (GSE28561, GSE28562, GSE28563).

RNAi feeding for synthetic lethal screening 

Bacteria from the OpenBioSystems RNAi library and the Ahringer RNAi library were used for RNAi feeding of L4 animals for 36 h at 25°C. Adults were then transferred to fresh plates for egg-laying for 4 h at 25°C. Adults were removed, and embryos were allowed to develop for 18-24 h prior to scoring. Embryos were scored for developmental progression using a dissecting microscope. For embryonic-lethal worms, the stage of developmental arrest was recorded as either the twofold stage (Pat) or other.

Data analysis 

WormBase release WS190 was used for all analyses. Read mapping and read processing were performed with Bowtie and ERANGE (Pepke et al. 2009). Python was used to perform the calculations described in the text. Genes associated with stress response (e.g., heat shock genes) were monitored for signs of damage or stress. We looked for enriched motifs near the ChIP-identified binding sites using MEME on sequences within various radii of the binding site. A greedy algorithm-based motif finder reproducibly identified the major nonrepeat motifs found with MEME. Enrichments were determined by χ² and hypergeometric statistical analysis.
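To illustrate the idea behind motif over-representation, the sketch below ranks fixed-length k-mers by how much more often they occur in foreground sequences (for example, windows around ChIP summits) than in background sequences. This is a deliberately simple k-mer ratio with pseudocounts. It is neither MEME nor the greedy motif finder used in the study, it scans only the forward strand, and the toy sequences are invented.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def kmer_presence(seqs: Iterable[str], k: int) -> Counter:
    """Count, for each k-mer, the number of sequences containing it (forward strand only)."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def rank_kmers(foreground: List[str], background: List[str], k: int,
               pseudocount: float = 1.0, top: int = 5) -> List[Tuple[str, float]]:
    """Rank foreground k-mers by the smoothed ratio of per-sequence frequencies (fg / bg)."""
    fg, bg = kmer_presence(foreground, k), kmer_presence(background, k)
    nf, nb = len(foreground), len(background)
    scores = {
        kmer: ((count + pseudocount) / (nf + pseudocount))
        / ((bg[kmer] + pseudocount) / (nb + pseudocount))
        for kmer, count in fg.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

Counting each k-mer at most once per sequence keeps simple repeats from dominating the ranking. A practical analysis would also merge reverse complements and test significance, for example with the hypergeometric tail.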

Data access 

All sequencing data have been submitted to the NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) for "HLH-1 binding in muscle-enriched embryos and RNA expression in muscle-enriched embryos across different mutations" (accession number GSE28563, including GSM707199-GSM707213), "Genome-wide maps of HLH-1 binding in muscle-enriched embryos" (accession number GSE28561, including GSM707199-GSM707202), and "Genome-wide RNA expression in muscle-enriched embryos across different mutations" (accession number GSE28562, including GSM707203-GSM707213).